The enormous volume of text published daily by Internet users has fostered the development of methods to analyze this content in several natural language processing areas, such as sentiment analysis. The main goal of this task is to classify the polarity of a message. Even though many approaches have been proposed for sentiment analysis, some of the most successful ones rely on the availability of large annotated corpora, which are expensive and time-consuming to build. In recent years, distant supervision has been used to obtain larger datasets. Inspired by these techniques, in this paper we extend such approaches to incorporate emojis, the popular graphic symbols used in electronic messages, in order to create a large sentiment corpus for Portuguese. Trained on almost one million tweets, several models were tested on both same-domain and cross-domain corpora. Our methods obtained very competitive results on five annotated corpora from mixed domains (Twitter and product reviews), demonstrating the domain independence of this approach. In addition, our results suggest that the combination of emoticons and emojis is able to properly capture the sentiment of a message.